
Percolate query optimization: Fetch fields mentioned in queries instead of entire doc and batch percolate query by heap-based threshold #1331

Draft
wants to merge 17 commits into base: main

Conversation

eirsep
Member

@eirsep eirsep commented Dec 20, 2023

Issue #, if available:
#1353
#1367

Description of changes:

This PR improves the scalability of the percolate queries performed by document-level monitors.

Status quo behaviour

The existing behaviour is to query and collect data from all shards of each index and then perform a percolate query on the collected docs, once per index. Hence the minimum number of percolate queries performed in each execution equals the number of concrete indices being queried.
This does not scale: the data node executing the doc-level monitor can exceed its heap limits when the docs held in memory from all shards of an index grow too large.

New behaviour introduced in the PR

We introduce a setting, plugins.alerting.monitor.percolate_query_docs_size_memory_percentage_limit, that caps how much doc source data may accumulate in memory before a percolate query must be performed.
With this batching, the minimum number of percolate queries per execution is 1 irrespective of the number of concrete indices being queried, and the total number of percolate queries in a given execution becomes more deterministic: it is a function of the data node's heap size rather than one query per concrete index.
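The batching described above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names (PercolateBatcher, addDoc, flush are not the PR's actual identifiers): docs accumulate across shards and indices, and a percolate query is triggered whenever the pending source bytes cross a percentage of the node's max heap, plus one final flush at the end.

```java
import java.util.ArrayList;
import java.util.List;

public class PercolateBatcher {
    private final long heapMaxBytes;
    private final int thresholdPercentage;      // stand-in for the new cluster setting
    private final List<String> pendingDocs = new ArrayList<>();
    private long pendingBytes = 0;
    private int percolateQueryCount = 0;        // how many batched queries ran

    public PercolateBatcher(long heapMaxBytes, int thresholdPercentage) {
        this.heapMaxBytes = heapMaxBytes;
        this.thresholdPercentage = thresholdPercentage;
    }

    public void addDoc(String docSource) {
        pendingDocs.add(docSource);
        pendingBytes += docSource.getBytes().length;
        // flush when pendingBytes >= thresholdPercentage% of the max heap
        if (pendingBytes * 100 >= heapMaxBytes * thresholdPercentage) {
            flush();
        }
    }

    // Called once after all indices are covered, so the minimum number of
    // percolate queries per execution is 1.
    public void finish() {
        if (!pendingDocs.isEmpty()) flush();
    }

    private void flush() {
        percolateQueryCount++;   // stand-in for actually running the percolate query
        pendingDocs.clear();
        pendingBytes = 0;
    }

    public int queryCount() { return percolateQueryCount; }
}
```

The query count depends only on total bytes relative to the heap-based threshold, not on how many indices contributed docs.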

CheckList:

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

matchingDocIdsPerIndex?.get(concreteIndexName),
monitorMetadata,
inputRunResults,
docsToQueries
Collaborator

We keep collecting doc ids to queries here and run the finding creation / trigger execution workflow once: https://github.com/opensearch-project/alerting/pull/1331/files#diff-68866b22ed9703814b4d5db8d3488872bcb972086ecaca10c9b8bfd54db981bcR237.
Do we also execute the finding creation / trigger execution workflow at each shard level?

Member Author

The trigger execution workflow needs all results.
For findings, we can brainstorm whether creation needs to happen at the shard level.
Currently, though, findings are indexed one by one. A bigger and more important optimization would be to ingest them in batches using bulk requests: #1333

@eirsep
Member Author

eirsep commented Dec 22, 2023

Will explore how I can factor in the free memory available at that point in time. We can decide on an acceptable fraction of the heap size, and if the accumulated data crosses that threshold, submit the percolate query with that many docs.
This way we can also cover multiple indices in a single query as long as the batch stays below the threshold, and decide when to execute another percolate query more deterministically.

@eirsep eirsep marked this pull request as draft December 22, 2023 23:48
@eirsep eirsep changed the title Percolate query optimization: Perform percolate query once for each shard's documents Percolate query optimization: Perform percolate query when docs in memory breach threshold Jan 4, 2024
@eirsep eirsep changed the title Percolate query optimization: Perform percolate query when docs in memory breach threshold Percolate query optimization: Perform percolate query when docs in memory breach threshold setting for batching Jan 4, 2024
@eirsep eirsep force-pushed the percolate_per_shard branch from f150174 to cb92f54 Compare January 4, 2024 03:05

@asuresh8 asuresh8 left a comment


Why no unit tests?

@@ -219,41 +233,39 @@ object DocumentLevelMonitorRunner : MonitorRunner() {
// Prepare DocumentExecutionContext for each index
val docExecutionContext = DocumentExecutionContext(queries, indexLastRunContext, indexUpdatedRunContext)

val matchingDocs = getMatchingDocs(
fetchShardDataAndMaybeExecutePercolateQueries(

This method has a lot of parameters. Consider using an object as the parameter. That way a builder could be used so that the order of arguments does not matter. Would also make this more readable.
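The reviewer's suggestion could look something like the sketch below. The class and field names (ShardFetchParams, concreteIndexName, shardId, maxSeqNo) are hypothetical stand-ins for the method's actual parameters; the point is that a builder makes argument order irrelevant and call sites self-describing.

```java
// Hypothetical parameter object replacing a long positional argument list.
public class ShardFetchParams {
    final String concreteIndexName;
    final int shardId;
    final long maxSeqNo;

    private ShardFetchParams(Builder b) {
        this.concreteIndexName = b.concreteIndexName;
        this.shardId = b.shardId;
        this.maxSeqNo = b.maxSeqNo;
    }

    public static class Builder {
        private String concreteIndexName;
        private int shardId;
        private long maxSeqNo;

        public Builder concreteIndexName(String v) { concreteIndexName = v; return this; }
        public Builder shardId(int v) { shardId = v; return this; }
        public Builder maxSeqNo(long v) { maxSeqNo = v; return this; }
        public ShardFetchParams build() { return new ShardFetchParams(this); }
    }
}
```

A call site then reads as named arguments: new ShardFetchParams.Builder().concreteIndexName("logs-0").shardId(2).build().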

}
}
/* if all indices are covered but the in-memory docs size limit is still not breached,
   we would need to submit the percolate query at the end */

Nit: indentation looks off.

*
*/
private fun isInMemoryDocsSizeExceedingMemoryLimit(docsBytesSize: Long, monitorCtx: MonitorRunnerExecutionContext): Boolean {
var thresholdPercentage = PERCOLATE_QUERY_DOCS_SIZE_MEMORY_PERCENTAGE_LIMIT.get(monitorCtx.settings)

It seems inefficient to compute the threshold every time this method is called. Is it possible to compute the threshold only when the monitor is initialized?

Member Author

The threshold is a dynamic setting, so we need to fetch the latest value every time we evaluate. OpenSearch does some settings caching in the background, so we are not really recomputing the threshold each time.

if (thresholdPercentage > 100 || thresholdPercentage < 0) {
thresholdPercentage = PERCOLATE_QUERY_DOCS_SIZE_MEMORY_PERCENTAGE_LIMIT.getDefault(monitorCtx.settings)
}
val heapMaxBytes = monitorCtx.jvmStats!!.mem.heapMax.bytes

The double-bang operator (!!) is not null-safe. Consider replacing it with an appropriate null check.

*/
private fun isInMemoryDocsSizeExceedingMemoryLimit(docsBytesSize: Long, monitorCtx: MonitorRunnerExecutionContext): Boolean {
var thresholdPercentage = PERCOLATE_QUERY_DOCS_SIZE_MEMORY_PERCENTAGE_LIMIT.get(monitorCtx.settings)
if (thresholdPercentage > 100 || thresholdPercentage < 0) {
Collaborator

Can this be validated when the setting is set rather than here?
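The reviewer's idea, validating at update time instead of clamping on every read, can be sketched with a self-contained stand-in. OpenSearch's Setting.intSetting has overloads that take min/max bounds and reject out-of-range values when the setting is applied; the hypothetical class below only mimics that behavior for illustration.

```java
// Stand-in for a bounded dynamic setting: invalid values are rejected on
// set(), so reads never need to re-validate or fall back to the default.
public class BoundedIntSetting {
    private final int min;
    private final int max;
    private int value;

    public BoundedIntSetting(int defaultValue, int min, int max) {
        this.min = min;
        this.max = max;
        this.value = defaultValue;
    }

    public void set(int newValue) {
        if (newValue < min || newValue > max) {
            throw new IllegalArgumentException(
                "value must be between " + min + " and " + max + ", was " + newValue);
        }
        value = newValue;
    }

    public int get() { return value; }
}
```

With bounds enforced at set time, the `thresholdPercentage > 100 || thresholdPercentage < 0` branch in the runner would become unnecessary.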

* level monitor execution. The docs are being collected from searching on shards of indices mentioned in the
* monitor input indices field.
*/
val PERCOLATE_QUERY_DOCS_SIZE_MEMORY_PERCENTAGE_LIMIT = Setting.intSetting(
Collaborator

Minor: MEMORY -> HEAP

}
} finally { // no catch block because exception is caught and handled in runMonitor() class
Collaborator

What's the need to wrap in try/finally if there's nothing being caught?

Comment on lines +712 to +745
inputRunResults.getOrPut(id) { mutableSetOf() }.add(docIndex)
docsToQueries.getOrPut(docIndex) { mutableListOf() }.add(id)
Collaborator

Minor: passing a collection to another function so it can be modified is error-prone. What do you think about returning these collections from this method as fields in an object, then populating the original inputRunResults and docsToQueries collections from the return value?
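A minimal sketch of that suggestion, with hypothetical names (PercolateResult, addMatch, mergeInto are illustrative, not the PR's code): the helper returns its query-to-doc matches in a small result object, and the caller merges them into the accumulated run results.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Result object returned by the percolate helper instead of mutating
// caller-owned collections in place.
public class PercolateResult {
    final Map<String, Set<String>> queryIdToDocIds = new HashMap<>();

    void addMatch(String queryId, String docId) {
        queryIdToDocIds.computeIfAbsent(queryId, k -> new HashSet<>()).add(docId);
    }

    // Caller-side merge into the accumulated inputRunResults-style map.
    static void mergeInto(Map<String, Set<String>> accumulated, PercolateResult r) {
        r.queryIdToDocIds.forEach((queryId, docIds) ->
            accumulated.computeIfAbsent(queryId, k -> new HashSet<>()).addAll(docIds));
    }
}
```

Ownership of the mutable state stays with the caller, so the helper has no side effects on its arguments.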

Comment on lines +717 to +750
transformedDocs.clear()
docsSizeInBytes.set(0)
Collaborator

Similar comment about passing collections to another function and then mutating them.

@eirsep
Member Author

eirsep commented Jan 10, 2024

@sbcd90
Should we also add another condition to batch the percolate query based on the number of docs, capping it at 100k docs per search at most, along with the memory limit?

That way both conditions are evaluated to decide whether to perform a percolate query on the current doc set.
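The proposed dual condition can be expressed as a tiny flush policy. This is an illustrative sketch (BatchPolicy and shouldFlush are hypothetical names): the pending batch is flushed when either the doc-count cap or the heap-derived byte threshold is breached, whichever comes first.

```java
// Flush policy combining a doc-count cap (e.g. 100k docs per percolate
// search) with the in-memory size threshold.
public class BatchPolicy {
    private final int maxDocs;
    private final long maxBytes;

    public BatchPolicy(int maxDocs, long maxBytes) {
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
    }

    public boolean shouldFlush(int pendingDocCount, long pendingBytes) {
        return pendingDocCount >= maxDocs || pendingBytes >= maxBytes;
    }
}
```

Evaluating both limits bounds each percolate search in two dimensions: request size in bytes and number of documents.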

@eirsep eirsep force-pushed the percolate_per_shard branch from e09669f to 27a6ec4 Compare January 11, 2024 07:36
… each shard instead of performing one percolate query on docs from all shards

Signed-off-by: Surya Sashank Nistala <[email protected]>
Signed-off-by: Surya Sashank Nistala <[email protected]>
Signed-off-by: Surya Sashank Nistala <[email protected]>
… percolate query

Signed-off-by: Surya Sashank Nistala <[email protected]>
…end of collecting data from all indices

Signed-off-by: Surya Sashank Nistala <[email protected]>
…ex is a text field in query index mapping

Signed-off-by: Surya Sashank Nistala <[email protected]>
… query should be performed immediately

Signed-off-by: Surya Sashank Nistala <[email protected]>
Signed-off-by: Surya Sashank Nistala <[email protected]>
…monitor to submit to percolate query instead of docs_source

Signed-off-by: Surya Sashank Nistala <[email protected]>
@eirsep eirsep force-pushed the percolate_per_shard branch from b10e56e to 009cd61 Compare January 25, 2024 01:23
@eirsep eirsep changed the title Percolate query optimization: Perform percolate query when docs in memory breach threshold setting for batching Percolate query optimization: Fetch fields mentioned in queries instead of entire doc and batch percolate query by heap-based threshold Jan 31, 2024
Signed-off-by: Surya Sashank Nistala <[email protected]>
@eirsep eirsep force-pushed the percolate_per_shard branch from b439559 to da6afbc Compare February 8, 2024 23:53
Signed-off-by: Surya Sashank Nistala <[email protected]>
…as is configured

Signed-off-by: Surya Sashank Nistala <[email protected]>
@eirsep
Member Author

eirsep commented Feb 19, 2024

Closing in favor of smaller PRs

6 participants